Spatial R Course
Ed Carnell & Phil Taylor
October 2019
This script uses the following packages:
sf is used for processing vector data (points, polygons, lines)raster is used for processing gridded data sets (rasters)ggplot2 is used for producing static maps and graphsleaflet is used for producing interactive mapsRColorBrewer provides a good range of colour schemes for mapsThe following lines load these required packages using the function require(package_name).
N.B. Packages need to be installed before they are loaded, the function install.packages("package_name") is used to install new packages.
#install.packages("sf")
#install.packages("raster")
#install.packages("leaflet")
#install.packages("RColorBrewer")
#install.packages("ggmap")
#install.packages("leafpop")
require(ggplot2)
require(raster)
require(leaflet)
require(RColorBrewer)
require(sf)
require(ggmap)
require(leafpop)
To the load in the file sent to us by Sovaia, we need to specify the location of the file.
There are two ways to do this
You could set a ‘working directory’ and specify the directory in which all your files are stored. To set a working directory the setwd() function is used. Once set, R will assume all specified files are stored in your a working directory.
You can specify the location of each file as you load them. I prefer this method of working as I typically need to access files across multiple working directories
# Set shared project folder
prj_dir <- "C:/Link/To/Your/Data/Folder/"
With the project folder specified, we can concatenate the file path and file name together using paste0() (This is the equivalent of the Excel function “Concatenate”). If you need to add in a separator between text strings the function paste() can be used. For example, if you wanted to add a space and comma between a list of names you’d use paste(c("Ed","Phil","Sovaia"),sep=", "), which would produce the result Ed, Phil, Sovaia
# Join file name to directory
fn <- paste0(prj_dir,"Points/Fiji_field_sites.csv")
# Print output
print(fn)
## [1] "C:/Link/To/Your/Data/Folder/Points/Fiji_field_sites.csv"
The csv file sent to us by Sovaia can be loaded using read.csv(). For (very) large .csv files the fread() function in the package data.table is extremely useful (I don’t use that here as it’s less intuitive if you’re not familiar with data.table)
# Read in csv from Sovaia
csv <- read.csv(fn)
# Visually inspect the data file
csv
## Site_ID Site_Name Research_Activity Personnel x_coord y_coord
## 1 1 Ba Goby 4 1888681 3934831
## 2 2 Lami Skink 12 1963448 3880167
## 3 3 Korolevu Hibiscus 2 1905213 3907123
## 4 4 Dakuni Turtles 5 1937862 3847929
## 5 5 Navua Skink 15 1938443 3869094
## 6 6 Suva HQ 47 1966864 3874481
## 7 7 Malevu Turtles 14 1838628 3983464
## 8 8 Tonge Hibiscus 4 1859599 3966160
## 9 9 Koroyanitu Skink 5 1877548 3929012
## 10 10 Voma A Hibiscus 2 1932943 3892168
## 11 11 Voma B Goby 4 1935578 3893486
## 12 12 Voma C Skink 3 1935048 3889403
## 13 13 Vitawa Turtles 8 1932105 3958886
## 14 14 Nayavutoka Turtels 2 1963082 3939379
## 15 15 Sigatoka Goby 11 1871212 3874765
You’ll notice that the spelling mistake of "Turtels" in the Research_Activity column is still there. To correct that we can use gsub(), this function as is the equivalent of “Find + Replace” in Microsoft Word & Excel
# Correct spelling error of 'Turtels'
csv$Research_Activity <- gsub(pattern = "Turtels",
replacement = "Turtles",
x = csv$Research_Activity)
We can check that has worked by returning all the unique values in the column “Research_Activity”. The unique() function is used to return all unique values (numbers/strings).
# Check error is fixed
unique(csv$Research_Activity)
## [1] "Goby" "Skink" "Hibiscus" "Turtles" "HQ"
With the spelling mistake corrected, we can now add spatial features to the data (using the coordinate columns)
data.frame to sfTo create a spatial object from data we’ve been sent, we need to convert the current data.frame into an sf object (sf stands for simple features and is a format used by R to handle spatial data).
To convert between the two object types we use the function st_as_sf(). The function st_as_sf requires:
x : an object to be converted into an sf object (in this example a data.frame)
coords : The names or numbers of the numeric columns holding coordinates (N.B. multiple coordinate can be supplied for polygon data, but this is a less common use of the function)
crs : coordinate reference system (in this case Fiji Map Grid - ESPG 3460)
# Convert data.frame to "simple features" object
# Coordinate Reference System = Fiji Map Grid (ESPG 3460)
sf_pnt <- st_as_sf(x = csv,
coords = c("x_coord","y_coord"),
crs = 3460)
# See what that looks like
sf_pnt
## Simple feature collection with 15 features and 4 fields
## geometry type: POINT
## dimension: XY
## bbox: xmin: 1838628 ymin: 3847929 xmax: 1966864 ymax: 3983464
## epsg (SRID): 3460
## proj4string: +proj=tmerc +lat_0=-17 +lon_0=178.75 +k=0.99985 +x_0=2000000 +y_0=4000000 +ellps=WGS72 +towgs84=0,0,4.5,0,0,0.554,0.2263 +units=m +no_defs
## First 10 features:
## Site_ID Site_Name Research_Activity Personnel geometry
## 1 1 Ba Goby 4 POINT (1888681 3934831)
## 2 2 Lami Skink 12 POINT (1963448 3880167)
## 3 3 Korolevu Hibiscus 2 POINT (1905213 3907123)
## 4 4 Dakuni Turtles 5 POINT (1937862 3847929)
## 5 5 Navua Skink 15 POINT (1938443 3869094)
## 6 6 Suva HQ 47 POINT (1966864 3874481)
## 7 7 Malevu Turtles 14 POINT (1838628 3983464)
## 8 8 Tonge Hibiscus 4 POINT (1859599 3966160)
## 9 9 Koroyanitu Skink 5 POINT (1877548 3929012)
## 10 10 Voma A Hibiscus 2 POINT (1932943 3892168)
In order to compare the Fijian points to other data sets, we need to change the coordinate system from Fiji Map Grid. WGS84 is commonly used for global data sets
# Project coordinates to WGS84 (longitude, latitude)
sf_pnt_wgs84<-st_transform(x = sf_pnt,
crs = 4326)
# Check that's worked
sf_pnt_wgs84
## Simple feature collection with 15 features and 4 fields
## geometry type: POINT
## dimension: XY
## bbox: xmin: 177.2335 ymin: -18.37327 xmax: 178.437 ymax: -17.14371
## epsg (SRID): 4326
## proj4string: +proj=longlat +datum=WGS84 +no_defs
## First 10 features:
## Site_ID Site_Name Research_Activity Personnel
## 1 1 Ba Goby 4
## 2 2 Lami Skink 12
## 3 3 Korolevu Hibiscus 2
## 4 4 Dakuni Turtles 5
## 5 5 Navua Skink 15
## 6 6 Suva HQ 47
## 7 7 Malevu Turtles 14
## 8 8 Tonge Hibiscus 4
## 9 9 Koroyanitu Skink 5
## 10 10 Voma A Hibiscus 2
## geometry
## 1 POINT (177.7013 -17.58611)
## 2 POINT (178.4048 -18.08256)
## 3 POINT (177.8558 -17.83724)
## 4 POINT (178.1621 -18.37327)
## 5 POINT (178.1682 -18.18204)
## 6 POINT (178.437 -18.134)
## 7 POINT (177.2335 -17.14371)
## 8 POINT (177.4294 -17.30143)
## 9 POINT (177.5961 -17.63811)
## 10 POINT (178.117 -17.97339)
You can see the values in geometry have changed and they are now in long/lat
Now that we have the points in long/lat, we can save this as a Shapefile or GeoPackage. We’ll save it in the same folder as the original prj_dir
# st write (used to save sf objects)
st_write(obj = sf_pnt_wgs84,
dsn = paste0(prj_dir,"Points/fiji_site_locations_wgs84.shp"),
delete_dsn = T)
## Deleting source `C:/Link/To/Your/Data/Folder/Points/fiji_site_locations_wgs84.shp' using driver `ESRI Shapefile'
## Writing layer `fiji_site_locations_wgs84' to data source `C:/Link/To/Your/Data/Folder/Points/fiji_site_locations_wgs84.shp' using driver `ESRI Shapefile'
## Writing 15 features with 4 fields and geometry type Point.
We can also quickly plot the newly projected data using the base plot() function
# Plot locations
plot(x = sf_pnt[,"Research_Activity"],
pch = 16, # pch = spoint style & 16 = filled circle
key.pos = 1) # posistion of legend. 1 = bottom
ggplot2The base plot() function is useful for quickly checking data, but for reports etc ggplot() tends to produce nicer outputs.
ggplotggplot() is really flexible as virtually every aspect of any graph can be edited. To start with, let’s create a black ggplot() object and add the projected points and colour these based on research activity.
#Create blank ggplot2 object
p <- ggplot()
#Add points
p <- p+geom_sf(data=sf_pnt_wgs84,
aes(col=Research_Activity),
size=2,
show.legend = "point")
#show output
p
It’s pretty hard to see where these points are located without having anything to compare them to.
Let’s load in the administrative boundaries and add these to the plot p
We can also specify the region we want to plot (using coord_sf) and change the colour scheme of the points (using scale_colour_brewer - a list of colour ramps can be found here, I’ve used Set1 as the data we are plotting is qualitative)
sf_poly <- st_read(dsn = paste0(prj_dir,"Polygons/fji_polbnda_adm2_province.shp"),
quiet=T)
sf_poly_wgs84 <- st_transform(x = sf_poly,
crs = 4326)
#Create blank ggplot2 object
p <- ggplot()
# Add polygon data
p <- p+geom_sf(data = sf_poly_wgs84,
col = "black",
fill = "green",
alpha = 0.5)
# Add point data
p <- p + geom_sf(data = sf_pnt_wgs84,
aes(col = Research_Activity),
size = 3,
show.legend = "point")
# Zoom into area of interest
p <- p + coord_sf(xlim = c(177, 179),
ylim = c(-18.5, -17))
# Add colour scale
p <- p + scale_colour_brewer(palette = "Set1",
name = "Activity")
p
I still find the map above pretty ugly, but fortunately you can edit the theme() in ggplot() to your own tastes
You can specify any colour you like in R, by using hex colours. The below below is "#75cbe0". If you Google hex colour picker a handy tool comes up for picking colours.
# Create a new object, so as not to overwrite the plot above (p2)
# Make the grid lines even (0.5 x 0.5 degree)
p2 <- p + scale_x_continuous(breaks = seq(177,179,0.5),
expand = c(0,0),
limits = c(177,179))
p2 <- p2 + scale_y_continuous(breaks = seq(-18.5,17,0.5),
expand = c(0,0),
limits = c(-18.5,-17))
# Remove grey background
p2 <- p2 + theme(panel.background = element_rect(colour = "black",
fill = "#d9eefa"),
# Add grey gridlines
panel.grid.major = element_line(colour = "grey"),
# Remove grey background in legend
legend.key = element_blank(),
# Adjust legend text size
legend.text = element_text(size=12,colour="black"),
legend.title = element_text(size=14,colour="black",face="bold"))
p2
Or you could just make something garish, it’s up to you
# Remove grey background
p3 <- p2 + theme(panel.background = element_rect(colour = "#40ff00",
fill = "#ff00c8",
size = 4),
# Add grey gridlines
panel.grid.major = element_line(colour = "#ff6f00"),
# Remove grey background in legend
legend.key = element_rect(colour = "#40ff00",
fill = "#ff00e1",
size = 2),
# Adjust legend text size
legend.text = element_text(size = 17,
colour = "#ff001e",
family = "sans"),
legend.title = element_text(size=14,
colour = "#ff3c00",
face = "italic",
family="serif"),
axis.text = element_text(size=16,
colour = "#8400ff",
face = "bold",
family = "mono"))
p3
ggmapIt’s Sometimes useful to add in base/background maps to check the verify the locations of your points/polygons or to produce nice looking maps for reports. In R we can download base maps using ggmap(). Unfortunately it’s now slightly complicated to add in Google base maps using ggmap as Google have added in the requirement for an API, which requires you to register and link your account to a credit-card. We can however, use third-party maps such as stamenmap
First we define a bounding box of the area we’re interested in (using lat/lon coordinates in WGS84). Then we plot the downloaded tiles using ggmap() and then add our point data in the same way as the ggplot() example.
# fetch basemap tiles of the study area
fiji <- get_stamenmap(bbox = c(left = 177,bottom = -18.5,
right = 179,top = -17),
zoom = 9,
maptype = "terrain")
# create a ggmap object using the downloaded tiles
p <- ggmap(fiji)
# Add points (as before in the ggplot example)
p <- p + geom_sf(data = sf_pnt_wgs84,
aes(col = Research_Activity),
size = 3,
show.legend = "point",
inherit.aes = FALSE)
p <- p + scale_colour_brewer(palette="Set1",
name="Activity")
# View outputs
p
leafletUnlike QGIS, the above two examples are static maps.
Leaflet is a useful package that allows you to explore spatial data in a similar way to QGIS
leaflet mapFirst we need to create a leaflet object and add a base map (“Provider tile”). The %>% is called a pipe and is useful for linking functions together without having to assign the result each time
# Create leaflet object
lf <- leaflet() %>%
# Add satellite imagery
addProviderTiles(provider = "Esri.WorldImagery",
layerId = "base") %>%
# Zoom into fiji
setView(lng = 177.8, lat= -17.8, zoom = 8)
lf
leafletThere’s also a load of other base maps (a full list of which can be found here) and includes this revolting number:
lf %>%
removeTiles(layerId = "base") %>%
addProviderTiles(provider = "Stamen.Watercolor",
layerId = "base")
# or this ESRI relief layer
lf %>%
removeTiles(layerId = "base") %>%
addProviderTiles(provider = "Esri.WorldShadedRelief",
layerId = "base")
leafletTo colour the points in leaflet we need to create a colour palette that has all the research activities in. We do this by using unique() again and brewer.pal. Also with leaflet maps you need to specify that you would like to add a legend to the map.
# Return all unique Research Activities
activities <- unique(sf_pnt_wgs84$Research_Activity)
# Assign a colour for each Research Activity
cols <- brewer.pal(length(activities), "Set1")
#Create Leaflet colour palette
pal <- colorFactor(cols, domain = activities)
# Add to leaflet object
lf2 <-lf %>%
# Add markers for each activity
addCircleMarkers(data = sf_pnt_wgs84,
color = ~pal(Research_Activity),
stroke = FALSE,
fillOpacity = 0.7,
radius = 6) %>%
# Add a legend to the map
addLegend(pal = pal,
title = "Activity",
values = sf_pnt_wgs84$Research_Activity,
layerId = "legend",
position = "bottomright")
lf2
We can also add in pop-ups of selected information when you click on the points. N.B. this popup = argument can be added to the addCircleMarkers above, but I thought I’d simplify the code a little
# Add to the latest leaflet object
lf3<-lf2 %>%
# Add markers for each activity
addCircleMarkers(data = sf_pnt_wgs84,
color = ~pal(Research_Activity),
stroke = FALSE,
fillOpacity = 0.7,
radius = 6,
popup = ~Site_Name)
lf3
Another common form of spatial data is gridded data (known as Raster data).
Raster data uses a regular grid of points to represent the data. This is data like a digital photo, where each pixel (or grid cell) is assigned a value (such as elevation or temperature). Since the grid is regular, the x and y coordinates do not need to be stored for each point. This makes it much quicker to work with compared with Point/Polygon/Line data, which is collectively known as vector data. Polygon data such as the Fiji boundary data, is typically larger and takes longer to visualise as it’s features can be irregular shapes and every feature has it’s own string of coordinates attached to it.
Ordnance Survey have written a more detailed comparison between vector and raster data can be found here
We’ll load in some global altitude data at 1 km x 1km grid resolution
# load data
r <- raster(x = paste0(prj_dir,"Raster/elevation_1KMmd_GMTEDmd.tif"))
# plot
plot(r)
# return raster information
r
## class : RasterLayer
## dimensions : 16800, 43200, 725760000 (nrow, ncol, ncell)
## resolution : 0.008333333, 0.008333333 (x, y)
## extent : -180, 180, -56, 84 (xmin, xmax, ymin, ymax)
## crs : +proj=longlat +datum=WGS84 +no_defs +ellps=WGS84 +towgs84=0,0,0
## source : C:/Link/To/Your/Data/Folder/Raster/elevation_1KMmd_GMTEDmd.tif
## names : elevation_1KMmd_GMTEDmd
## values : -428.5, 8650.5 (min, max)
This file has 725,760,000 values and plots in seconds! Rasterizing data is really useful, especially when you’re working with global data or national data at a high spatial resolution. The CEH Land Cover data set for example has over 1 billion grid cells and provides land cover information for every 25 m x 25 m grid cell in the UK.
The summary information of the raster seems to make sense. The lowest grid cell has a value of 428.5 & the highest value is 8,650.5 m. Although Mt. Everest is 8,848 m high, gridded data only provides one value for each cell (here 1 km x 1 km) so often doesn’t capture extremes in data (in the same way that a spot-height would).
Sometimes you may want to lower the spatial resolution of a raster. You may want to do this for the following reasons:
Regional vs Local: If you wanted to use the 25 m x 25 m CEH land cover data to produce a map of dominant land-covers in the UK, plotting the original would take a while and would be pointless as you’d be unable to distinguish between the values of > 1 billion cells. In this case it would make sense to increase the spatial resolution to 1 km x 1 km and visualise the most common (modal) value in the 1,600 individual (25 m x 25 m) cells that make up the 1 km x 1km grid.
Speed: Some operations do take a while to run over millions (or billions) of cells, so if you can get away with using a lower-resolution data it can speed things up
Alternatively as we’re only concerned with a tiny proportion of the global data, we can just crop the raster data to our Fijian study area. Rather than aggregating the resolution.
Let’s crop the raster data and compare the 1km elevation data to mean elevation at a grid resolution of 10 km x 10 km
#crop raster to study area
r_crp <- crop(x = r, y = extent(x = 177,
xmax = 179,
ymin = -18.5,
ymax = -17))
#plot 1km cropped data
plot(r_crp)
#Estimate mean elevation at 10km grid resolution
r_10km <- aggregate(x = r_crp,
fact = 10,
fun = mean)
#plot 10km cropped data
plot(r_10km)
We can also extract values of a raster using vector data.
For example, we could estimate the elevation at each of our study locations. We’ll use the original 1km raster as the aggregated one looks a bit coarse for our purposes
elv <- extract(r_crp,sf_pnt_wgs84)
# Add results to sf object
sf_pnt_wgs84$Elevation <- elv
sf_pnt_wgs84[,c("Site_Name","Research_Activity","Elevation")]
## Simple feature collection with 15 features and 3 fields
## geometry type: POINT
## dimension: XY
## bbox: xmin: 177.2335 ymin: -18.37327 xmax: 178.437 ymax: -17.14371
## epsg (SRID): 4326
## proj4string: +proj=longlat +datum=WGS84 +no_defs
## First 10 features:
## Site_Name Research_Activity Elevation geometry
## 1 Ba Goby 12.0 POINT (177.7013 -17.58611)
## 2 Lami Skink 142.5 POINT (178.4048 -18.08256)
## 3 Korolevu Hibiscus 672.0 POINT (177.8558 -17.83724)
## 4 Dakuni Turtles 0.0 POINT (178.1621 -18.37327)
## 5 Navua Skink 27.5 POINT (178.1682 -18.18204)
## 6 Suva HQ 48.5 POINT (178.437 -18.134)
## 7 Malevu Turtles 0.0 POINT (177.2335 -17.14371)
## 8 Tonge Hibiscus 0.0 POINT (177.4294 -17.30143)
## 9 Koroyanitu Skink 524.5 POINT (177.5961 -17.63811)
## 10 Voma A Hibiscus 483.0 POINT (178.117 -17.97339)
The elevation height for Site_Name: Tonge (line 8) seems to be a bit odd. Hibiscus should be inland.
Let’s Google where the Tonge site is and check that it corresponds with a place called Tonge in Fiji
## Coordinates
st_coordinates(sf_pnt_wgs84[sf_pnt_wgs84$Site_Name=="Tonge",])
## X Y
## 1 177.4294 -17.30143
N.B. Google Maps assumes coordinates are in Lat/Long (y/x) rather than Long/Lat (x/y) that we currently have
Oh no! This research station appears to be in the sea, which seems like an odd location for Hibiscus.
Let’s replace the values with 177.740455, -17.628975 (the real location of Tonge, Fiji)
# Update coordinates
st_geometry(sf_pnt_wgs84[sf_pnt_wgs84$Site_Name=="Tonge",]) <- st_sfc(st_point(c(177.740455, -17.628975)))
# Repeat extraction of elevation
sf_pnt_wgs84$Elevation <- extract(r_crp,sf_pnt_wgs84)
sf_pnt_wgs84[sf_pnt_wgs84$Site_Name=="Tonge",c("Site_Name","Research_Activity","Elevation")]
## Simple feature collection with 1 feature and 3 fields
## geometry type: POINT
## dimension: XY
## bbox: xmin: 177.7405 ymin: -17.62898 xmax: 177.7405 ymax: -17.62898
## epsg (SRID): 4326
## proj4string: +proj=longlat +datum=WGS84 +no_defs
## Site_Name Research_Activity Elevation geometry
## 8 Tonge Hibiscus 22 POINT (177.7405 -17.62898)
# raster palette
r_pal <- colorNumeric(c("#0C2C84", "#41B6C4", "#FFFFCC"), values(r_crp),
na.color = "transparent")
# Create leaflet object
leaflet() %>%
# Add satellite imagery
addProviderTiles(provider = "OpenStreetMap.Mapnik",
layerId = "xx") %>%
# Zoom into fiji
setView(lng = 177.8, lat= -17.8, zoom = 8) %>%
# Add a raster to the map
addRasterImage(x = r_crp,
colors = r_pal,
opacity = 0.5) %>%
addLegend(pal = r_pal,
values = values(r_crp),
position = "bottomright",
title = "Elevation") %>%
# Add markers for each activity
addCircleMarkers(data = sf_pnt_wgs84,
color = ~pal(Research_Activity),
stroke = FALSE,
fillOpacity = 0.5,
radius = 6,
label = ~Site_Name,
popup = popupTable(x = sf_pnt_wgs84%>%st_set_geometry(NULL),
#Remove geometry from sf object (it looks ugly otherwise)
feature.id=F,
row.numbers = F)) %>%
# Add a legend to the map
addLegend(pal = pal,
title = "Activity",
values = sf_pnt_wgs84$Research_Activity,
layerId = "legend",
position = "topleft")